Inverse Reinforcement Learning via Ranked and Failed Demonstrations
Author

Abstract
In many robotics applications, applying reinforcement learning (RL) is difficult because it requires a prespecified reward function over the environment's states, which is often hard to define. Inverse Reinforcement Learning (IRL) [1] addresses this problem by learning the reward function from human demonstrations, without requiring a human to define it explicitly. However, IRL assumes that the human demonstrator performs optimally, which can be unrealistic in very complex tasks. Prior attempts to relax this assumption, such as active learning and interactive corrections, often assume that task execution can be paused or that the demonstrator can provide the optimal action when queried. With the goal of an IRL framework that performs on par with current state-of-the-art reinforcement learning algorithms, we introduce a new algorithm, IRLFR, which combines inverse reinforcement learning from failure (IRLF) with inverse reinforcement learning from ranked demonstrations (IRLR). The resulting framework addresses difficult robot-learning problems with suboptimal demonstrations, allowing robots to perform at a superhuman level while minimizing the need for interactive corrections and active learning. We validate our approach in a gridworld domain in which optimal play is nonintuitive for humans but existing reinforcement learning algorithms easily find the optimal policy. Our results show that, on average, both IRLR and IRLFR outperform the Maximum Entropy IRL (MEIRL) baseline.
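The abstract does not give IRLFR's exact objective. As a minimal sketch of the ranked-demonstrations ingredient, assuming a T-REX-style Bradley-Terry pairwise loss over linear trajectory returns (the feature vectors, learning rate, weight decay, and the choice to place a failed demonstration at the bottom of the ranking are all illustrative assumptions, not the paper's formulation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy aggregated feature counts per trajectory, indexed worst -> best:
# ranked[i] is preferred over ranked[j] whenever i > j. A failed
# demonstration can simply be inserted at the bottom of the ranking.
failed = rng.normal(size=4)
ranked = [failed] + [rng.normal(size=4) for _ in range(4)]

w = np.zeros(4)            # linear reward weights: R(traj) = phi(traj) @ w
lr, decay = 0.1, 0.01
for _ in range(500):
    grad = np.zeros_like(w)
    for j in range(len(ranked)):
        for i in range(j + 1, len(ranked)):    # trajectory i outranks j
            ri, rj = ranked[i] @ w, ranked[j] @ w
            # gradient of -log sigmoid(ri - rj) with respect to w
            p = 1.0 / (1.0 + np.exp(ri - rj))  # = 1 - sigmoid(ri - rj)
            grad += -p * (ranked[i] - ranked[j])
    w -= lr * (grad / len(ranked) + decay * w)

print("learned reward weights:", w)
```

Driving the loss to zero pushes the learned return of higher-ranked trajectories above lower-ranked ones, which is what lets ranking-based IRL extrapolate beyond suboptimal demonstrations.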
Similar Papers
Inverse Reinforcement Learning from Failure
Inverse reinforcement learning (IRL) allows autonomous agents to learn to solve complex tasks from successful demonstrations. However, in many settings, e.g., when a human learns the task by trial and error, failed demonstrations are also readily available. In addition, in some tasks, purposely generating failed demonstrations may be easier than generating successful ones. Since existing IRL me...
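The snippet above is truncated, so as a crude illustration only (not IRLF's actual maximum-causal-entropy objective), one way failed demonstrations can enter a linear-reward objective is as feature expectations to move away from:

```python
import numpy as np

# Illustrative feature expectations (values invented for the sketch).
mu_success = np.array([0.8, 0.1, 0.4])   # from successful demonstrations
mu_failure = np.array([0.2, 0.9, 0.5])   # from failed demonstrations

w = np.zeros(3)
lr, lam, reg = 0.05, 0.5, 0.1            # lam weights failure avoidance
for _ in range(200):
    # ascend reward on success features, descend on failure features
    grad = mu_success - lam * mu_failure - reg * w
    w += lr * grad

print("reward weights:", w)   # large where successes differ from failures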
Active Learning from Critiques via Bayesian Inverse Reinforcement Learning
Learning from demonstration algorithms, such as Inverse Reinforcement Learning, aim to provide a natural mechanism for programming robots, but can often require a prohibitive number of demonstrations to capture important subtleties of a task. Rather than requesting additional demonstrations blindly, active learning methods leverage uncertainty to query the user for action labels at states with ...
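A minimal sketch of uncertainty-driven query selection in this spirit (the posterior samples here are random stand-ins, not an actual Bayesian IRL posterior): sample reward hypotheses, compute each hypothesis's greedy action per state, and query the state where the hypotheses disagree most, measured by vote entropy.

```python
import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions, n_hyp = 10, 4, 30

# Toy stand-in for a Bayesian IRL posterior: sampled Q-tables.
q_samples = rng.normal(size=(n_hyp, n_states, n_actions))

greedy = q_samples.argmax(axis=2)             # (n_hyp, n_states)
disagreement = np.empty(n_states)
for s in range(n_states):
    counts = np.bincount(greedy[:, s], minlength=n_actions)
    p = counts / n_hyp
    disagreement[s] = -(p[p > 0] * np.log(p[p > 0])).sum()  # vote entropy

query_state = int(disagreement.argmax())
print("query the demonstrator at state", query_state)
```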
Nonlinear Inverse Reinforcement Learning with Gaussian Processes
We present a probabilistic algorithm for nonlinear inverse reinforcement learning. The goal of inverse reinforcement learning is to learn the reward function in a Markov decision process from expert demonstrations. While most prior inverse reinforcement learning algorithms represent the reward as a linear combination of a set of features, we use Gaussian processes to learn the reward as a nonli...
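As a minimal sketch of the representational idea only (the full algorithm also infers the latent reward targets jointly with the kernel hyperparameters; the inducing points and targets below are invented), a GP with an RBF kernel can represent a nonlinear reward over state features:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(2)
X_u = rng.uniform(-1, 1, size=(8, 2))      # inducing state features
r_u = np.sin(X_u[:, 0]) - X_u[:, 1] ** 2   # stand-in latent rewards

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5)).fit(X_u, r_u)

X_query = rng.uniform(-1, 1, size=(5, 2))
r_mean, r_std = gp.predict(X_query, return_std=True)
print("predicted rewards:", r_mean)
print("posterior std (uncertainty):", r_std)
```

The posterior standard deviation is what distinguishes the GP from a plain nonlinear regressor: it quantifies where the learned reward is well constrained by the demonstrations.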
Imitation and Reinforcement Learning from Failed Demonstrations
Current work in robotic imitation learning uses successful demonstrations of a task performed by a human teacher to initialize a robot controller. Given a reward function, this learned controller can then be improved using techniques derived from reinforcement learning. We instead use failed attempts, which may be more plentiful, to initialize our controller and, taking them as illustrations of...
Inverse Reinforcement Learning via Deep Gaussian Process
We propose a new approach to inverse reinforcement learning (IRL) based on the deep Gaussian process (deep GP) model, which is capable of learning complicated reward structures with few demonstrations. Our model stacks multiple latent GP layers to learn abstract representations of the state feature space, which is linked to the demonstrations through the Maximum Entropy learning framework. Inco...
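The deep-GP layers are beyond a short sketch, but the Maximum Entropy linkage the snippet mentions is the standard MaxEnt IRL gradient: demonstration feature expectations minus those induced by the current reward. A tabular sketch (toy MDP, one-hot state features, and invented demo visitation counts; this is plain MaxEnt IRL, not the deep-GP variant):

```python
import numpy as np

rng = np.random.default_rng(3)
n_s, n_a, T = 5, 2, 10

P = rng.integers(0, n_s, size=(n_s, n_a))   # toy deterministic transitions
phi = np.eye(n_s)                            # one-hot state features
mu_demo = np.zeros(n_s)
mu_demo[[0, 2, 4]] = [0.5, 0.3, 0.2]         # invented demo state visitation

w = np.zeros(n_s)
for _ in range(100):
    r = phi @ w
    # Soft (MaxEnt) backward pass: V(s) = log sum_a exp(r(s) + V(s'))
    V = np.zeros(n_s)
    for _ in range(T):
        Q = r[:, None] + V[P]                # Q[s, a] = r(s) + V(P[s, a])
        V = np.logaddexp.reduce(Q, axis=1)
    pi = np.exp(Q - V[:, None])              # soft policy over actions
    # Forward pass: expected state visitation under the soft policy.
    d = np.full(n_s, 1.0 / n_s)              # uniform start distribution
    mu = d.copy()
    for _ in range(T - 1):
        d_next = np.zeros(n_s)
        for s in range(n_s):
            for a in range(n_a):
                d_next[P[s, a]] += d[s] * pi[s, a]
        d = d_next
        mu += d
    mu /= T
    w += 0.1 * (mu_demo - mu)                # MaxEnt IRL gradient step

print("learned state rewards:", phi @ w)
```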
Journal title:
Volume/Issue:
Pages: -
Publication date: 2016